Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has
recently been extended to exploit prior knowledge on the cardinality of each
cluster. Such knowledge is used to improve both performance and solution
quality. In this paper, we propose a global optimization approach based on the
branch-and-cut technique to solve the cardinality-constrained MSSC. For the
lower bound routine, we use the semidefinite programming (SDP) relaxation
recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239,
(2019)]. However, this relaxation can be used in a branch-and-cut method only
for small-size instances. Therefore, we derive a new SDP relaxation that scales
better with the instance size and the number of clusters. In both cases, we
strengthen the bound by adding polyhedral cuts. Benefiting from a tailored
branching strategy which enforces pairwise constraints, we reduce the
complexity of the problems arising in the child nodes. For the upper bound,
instead, we present a local search procedure that exploits the solution of the
SDP relaxation solved at each node. Computational results show that the
proposed algorithm globally solves, for the first time, real-world instances of
size 10 times larger than those solved by state-of-the-art exact methods.
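The branching strategy described above enforces pairwise constraints so that child nodes become smaller instances. A minimal sketch of the must-link case (our illustration, not the paper's code; point coordinates and weights are hypothetical): two must-linked points are replaced by their weighted centroid, shrinking the instance by one point while preserving the squared-distance objective up to a constant.

```python
# Illustrative sketch: merging a must-linked pair (i, j) in a child node.
def merge_must_link(points, weights, i, j):
    """Replace points i and j by their weighted centroid."""
    wi, wj = weights[i], weights[j]
    merged = tuple((wi * a + wj * b) / (wi + wj)
                   for a, b in zip(points[i], points[j]))
    new_points = [p for k, p in enumerate(points) if k not in (i, j)]
    new_weights = [w for k, w in enumerate(weights) if k not in (i, j)]
    new_points.append(merged)
    new_weights.append(wi + wj)
    return new_points, new_weights

pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
wts = [1.0, 1.0, 1.0]
pts2, wts2 = merge_must_link(pts, wts, 0, 1)
# the merged point is the weighted centroid (1.0, 0.0) with weight 2.0
```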
A machine learning approach for forecasting hierarchical time series
In this paper, we propose a machine learning approach for forecasting
hierarchical time series. When dealing with hierarchical time series, apart
from generating accurate forecasts, one needs to select a suitable method for
producing reconciled forecasts. Forecast reconciliation is the process of
adjusting forecasts to make them coherent across the hierarchy. In the literature,
coherence is often enforced by using a post-processing technique on the base
forecasts produced by suitable time series forecasting methods. On the
contrary, our idea is to use a deep neural network to directly produce accurate
and reconciled forecasts. We exploit the ability of a deep neural network to
extract information capturing the structure of the hierarchy. We impose the
reconciliation at training time by minimizing a customized loss function. In
many practical applications, besides time series data, hierarchical time series
include explanatory variables that are beneficial for increasing the
forecasting accuracy. Exploiting this additional information, our approach
combines time series features extracted at any level of the hierarchy with the
explanatory variables in an end-to-end neural network that produces accurate
and reconciled point forecasts. The effectiveness of the approach is validated
on three real-world datasets, where our method outperforms state-of-the-art
competitors in hierarchical forecasting.
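The idea of imposing reconciliation at training time through a customized loss can be sketched as follows (an assumed toy formulation, not the paper's actual loss or hierarchy): accuracy is measured by squared error, and a penalty term compares the forecasts at all levels with the re-aggregation of their own bottom-level entries, vanishing exactly when the forecasts are coherent.

```python
# Toy two-level hierarchy: total = A + B.  The summing matrix S maps
# bottom-level series (A, B) to all levels (total, A, B).
S = [[1, 1],   # total = A + B
     [1, 0],   # A
     [0, 1]]   # B

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def loss(y_hat, y_true, lam=1.0):
    """Squared error plus a coherence penalty (hypothetical loss).

    y_hat, y_true: forecasts/targets at all levels [total, A, B].
    The penalty is zero iff y_hat respects the hierarchy.
    """
    mse = sum((p - t) ** 2 for p, t in zip(y_hat, y_true))
    coherent = matvec(S, y_hat[1:])          # re-aggregate bottom levels
    penalty = sum((p - c) ** 2 for p, c in zip(y_hat, coherent))
    return mse + lam * penalty

# A coherent, exact forecast incurs zero loss:
# loss([5.0, 2.0, 3.0], [5.0, 2.0, 3.0]) == 0.0
```

In a real model the penalty would be minimized jointly with the forecasting loss over the network weights, steering the outputs toward coherence without a post-processing reconciliation step.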
An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is
traditionally considered an unsupervised learning task. In recent years, the
use of background knowledge to improve the cluster quality and promote
interpretability of the clustering process has become a hot research topic at
the intersection of mathematical optimization and machine learning research.
The problem of taking advantage of background information in data clustering is
called semi-supervised or constrained clustering. In this paper, we present a
branch-and-cut algorithm for semi-supervised MSSC, where background knowledge
is incorporated as pairwise must-link and cannot-link constraints. For the
lower bound procedure, we solve the semidefinite programming relaxation of the
MSSC discrete optimization model, and we use a cutting-plane procedure for
strengthening the bound. For the upper bound, instead, we use integer
programming tools to adapt the k-means algorithm to the constrained case.
For the first time, the proposed global optimization
algorithm efficiently manages to solve real-world instances up to 800 data
points with different combinations of must-link and cannot-link constraints and
with a generic number of features. This problem size is about four times larger
than that of the instances solved by state-of-the-art exact algorithms.
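A flavor of how cannot-link constraints interact with a k-means-style upper bound can be given by the assignment step alone (a simplified, COP-k-means-style sketch of our own, not the paper's integer-programming-based heuristic): each point is greedily assigned to the nearest centroid that does not violate a cannot-link pair with an already-assigned point.

```python
# Illustrative sketch: a constraint-respecting k-means assignment step.
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def constrained_assign(points, centroids, cannot_link):
    """Return one cluster label per point, or None if infeasible."""
    labels = [None] * len(points)
    for i, p in enumerate(points):
        order = sorted(range(len(centroids)),
                       key=lambda c: sq_dist(p, centroids[c]))
        for c in order:
            partners = [b if a == i else a
                        for a, b in cannot_link if i in (a, b)]
            # reject c if a cannot-linked partner already sits there
            if any(labels[j] == c for j in partners):
                continue
            labels[i] = c
            break
        if labels[i] is None:
            return None  # no feasible centroid for point i
    return labels

pts = [(0.0,), (0.1,), (5.0,)]
cents = [(0.0,), (5.0,)]
# points 0 and 1 are near each other but must be separated
labels = constrained_assign(pts, cents, cannot_link=[(0, 1)])
# point 1 is pushed to the second centroid despite being closer to the first
```

The greedy step can fail (return None) on hard constraint sets, which is one reason the paper resorts to integer programming tools rather than a purely greedy adaptation.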
SOS-SDP: An Exact Solver for Minimum Sum-of-Squares Clustering
The minimum sum-of-squares clustering problem (MSSC) consists of partitioning
n observations into k clusters in order to minimize the sum of squared distances
from the points to the centroid of their cluster. In this paper, we propose an exact
algorithm for the MSSC problem based on the branch-and-bound technique. The lower
bound is computed by using a cutting-plane procedure in which valid inequalities are
iteratively added to the Peng–Wei semidefinite programming (SDP) relaxation. The
upper bound is computed with the constrained version of k-means in which the initial
centroids are extracted from the solution of the SDP relaxation. In the branch-and-bound
procedure, we incorporate instance-level must-link and cannot-link constraints
to express knowledge about which data points should or should not be grouped
together. We manage to reduce the size of the problem at each level, preserving the
structure of the SDP problem itself. To the best of our knowledge, the obtained results
show that the approach successfully solves, for the first time, real-world
instances of up to 4,000 data points.
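To see why the Peng–Wei SDP relaxation yields a valid lower bound, note that every genuine clustering corresponds to a feasible point of the relaxation. A small self-contained check (our illustration under the standard formulation, not the solver's code): for an assignment of n points into k clusters, the matrix Z with Z[i][j] = 1/|C| when i and j share cluster C (and 0 otherwise) satisfies the relaxation's linear constraints Z >= 0, Z·1 = 1, and trace(Z) = k; the relaxation keeps these and drops only the combinatorial structure of such Z.

```python
# Build the cluster-membership matrix Z induced by a labeling and
# verify the Peng-Wei constraints on a toy instance.
def membership_matrix(labels):
    n = len(labels)
    size = {}
    for c in labels:
        size[c] = size.get(c, 0) + 1
    return [[1.0 / size[labels[i]] if labels[i] == labels[j] else 0.0
             for j in range(n)] for i in range(n)]

labels = [0, 0, 1, 1, 1]          # 5 points, k = 2 clusters
Z = membership_matrix(labels)

row_sums = [sum(row) for row in Z]
trace = sum(Z[i][i] for i in range(len(Z)))
# every row sums to 1.0 and the trace equals k = 2,
# as the SDP constraints require
```

Since the relaxation optimizes over a superset of these matrices, its optimal value can only be lower than (or equal to) the best clustering objective, which is exactly what a branch-and-bound lower bound needs.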